Comparing Knowledge-Based Sampling to Boosting
Abstract
Boosting algorithms for classification are based on altering the initial distribution assumed to underlie a given example set. The idea of knowledge-based sampling (KBS) is to sample out prior knowledge and previously discovered patterns, so that subsequently applied data mining algorithms automatically focus on novel patterns without any need to adjust the base algorithm. This sampling strategy anticipates a user's expectation, based on a set of constraints on how to adjust the distribution. In the classification setting, KBS is similar to boosting. This article shows that a specific, very simple KBS algorithm is able to boost weak base classifiers. It discusses differences from AdaBoost.M1 and LogitBoost, and it compares the performance of these algorithms empirically in terms of predictive accuracy, area under the ROC curve, and squared error.
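To make the phrase "altering the initial distribution" concrete, the following is a minimal sketch of one reweighting round in the style of the standard AdaBoost.M1 scheme. It is not the KBS algorithm from the article, whose details are not given in this abstract. The function name `adaboost_m1_reweight` and its argument names are hypothetical, chosen for illustration only.

```python
import math


def adaboost_m1_reweight(weights, y_true, y_pred):
    """One AdaBoost.M1-style reweighting round (illustrative sketch).

    weights: current example weights (any positive values)
    y_true:  true class labels
    y_pred:  labels predicted by the current weak classifier
    Returns the new normalized weights and the classifier's vote weight.
    """
    # Normalize so the weights form a distribution over the examples.
    total = sum(weights)
    weights = [w / total for w in weights]

    # Weighted training error of the weak classifier.
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    if not 0.0 < error < 0.5:
        raise ValueError("AdaBoost.M1 needs a weighted error in (0, 0.5)")

    # Down-weight correctly classified examples by beta = error / (1 - error).
    beta = error / (1.0 - error)
    updated = [w * beta if t == p else w
               for w, t, p in zip(weights, y_true, y_pred)]

    # Renormalize; misclassified examples now carry half of the total mass.
    norm = sum(updated)
    new_weights = [w / norm for w in updated]

    # Vote weight of this classifier in the final ensemble.
    vote = math.log(1.0 / beta)
    return new_weights, vote


# Hypothetical toy data: four examples, the last one is misclassified.
new_weights, vote = adaboost_m1_reweight(
    [0.25, 0.25, 0.25, 0.25], [1, 1, 0, 0], [1, 1, 0, 1]
)
```

After the update, the next weak classifier is trained on a distribution that emphasizes the example the previous classifier got wrong, which is the sense in which boosting changes the distribution rather than the base learner.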
Related resources
Comparing Two Approaches for Adding Feature Ranking to Sampled Ensemble Learning for Software Quality Estimation
High dimensionality and class imbalance are two main problems that affect the quality of training datasets in software defect prediction, resulting in inefficient classification models. Feature selection and data sampling are often used to overcome these problems. Feature selection is a process of choosing the most important attributes from the original data set. Data sampling alters the data s...
Modeling Dynamic Systems with Efficient Ensembles of Process-Based Models
Ensembles are a well established machine learning paradigm, leading to accurate and robust models, predominantly applied to predictive modeling tasks. Ensemble models comprise a finite set of diverse predictive models whose combined output is expected to yield an improved predictive performance as compared to an individual model. In this paper, we propose a new method for learning ensembles of ...
CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification
Class imbalance classification is a challenging research problem in data mining and machine learning, as most of the real-life datasets are often imbalanced in nature. Existing learning algorithms maximise the classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, the minority class instances are representing the concept with greater in...
Comparing the effectiveness of SMS and lectures on the job training for nurses
Introduction: Due to difficulties with teaching methods in nursing education, and uncertainty about the effectiveness of teaching by SMS, this study compared the effectiveness of SMS and lectures for on-the-job training of nurses. Methods: This study was a randomized intervention study conducted in 2014. The samples in this study were selected by cluster sampling among the nurses in hospitals of Arak. Then ...
A New Adaptive Sampling Method for Scalable Learning
Scaling up data mining algorithms to handle huge data sets is an important issue in machine learning and knowledge discovery. Random sampling is often used to achieve better scalability in learning from massive amounts of data. Adaptive sampling offers advantages over traditional batch sampling methods in that adaptive sampling often uses a much lower number of samples and thus better efficiency w...